HYDRAstor: A Scalable Secondary Storage
Abstract
HYDRAstor is a scalable secondary storage solution aimed at the enterprise market. The system consists of a back-end architected as a grid of storage nodes built around a distributed hash table, and a front-end consisting of a layer of access nodes which implement a traditional file system interface and can be scaled in number for increased performance. This paper concentrates on the back-end which is, to our knowledge, the first commercial implementation of a scalable, high-performance content-addressable secondary storage delivering global duplicate elimination, per-block user-selectable failure resiliency, and self-maintenance, including automatic recovery from failures with rebuilding of data and the network overlay. The back-end programming model is based on an abstraction of a sea of variable-sized, content-addressed, immutable, highly resilient data blocks organized in a DAG (directed acyclic graph). This model is exported through a low-level API that allows clients to implement new access protocols and add them to the system on-line. The API has been validated with an implementation of the file system interface. The critical factor in meeting the design targets has been the selection of a proper data organization based on redundant chains of data containers. We present this organization in detail and describe how it is used to deliver the required data services. Surprisingly, the most complex service to deliver turned out to be on-demand data deletion, followed (not surprisingly) by the management of data consistency and integrity.
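The block model described in the abstract (immutable, content-addressed blocks linked into a DAG, with duplicates stored once) can be illustrated with a toy in-memory sketch. This is not the HYDRAstor API: the `BlockStore` class, its method names, and the choice of SHA-256 for addressing are all assumptions made for illustration, and resiliency, distribution, and deletion are left out entirely.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed store (hypothetical, not the HYDRAstor API)."""

    def __init__(self):
        # address -> (data, tuple of child addresses)
        self._blocks = {}

    def write(self, data: bytes, children: tuple = ()) -> str:
        # A block may only point at blocks that already exist. Combined with
        # immutability, this keeps the block graph acyclic.
        for child in children:
            if child not in self._blocks:
                raise KeyError(f"unknown child block: {child}")
        # Assumption: the address is a SHA-256 digest over the data length,
        # the data, and the child addresses.
        digest = hashlib.sha256()
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
        for child in children:
            digest.update(bytes.fromhex(child))
        address = digest.hexdigest()
        # Identical content yields the same address, so it is stored once.
        self._blocks.setdefault(address, (data, tuple(children)))
        return address

    def read(self, address: str):
        return self._blocks[address]

    def block_count(self) -> int:
        return len(self._blocks)


store = BlockStore()
a = store.write(b"chunk-1")
b = store.write(b"chunk-2")
duplicate = store.write(b"chunk-1")  # same content, same address
root = store.write(b"file-root", children=(a, b))
```

Here `duplicate` equals `a` and the store holds three blocks rather than four, which is the duplicate-elimination effect in miniature; `root` plays the role of a DAG node pointing at the two data blocks.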
Similar resources
HydraFS: A High-Throughput File System for the HYDRAstor Content-Addressable Storage System
A content-addressable storage (CAS) system is a valuable tool for building storage solutions, providing efficiency by automatically detecting and eliminating duplicate blocks; it can also be capable of high throughput, at least for streaming access. However, the absence of a standardized API is a barrier to the use of CAS for existing applications. Additionally, applications would have to deal ...
Impact of Data Organization on Distributed Storage System
With the explosive growth of data stored in digital format, there is a need for a new approach to data storage. The large amount of stored data requires modern storage systems to be scalable and easily extendable on-line. Moreover, the data must be resilient and highly available, which in turn requires failure-tolerant and highly available storage. To address these needs a new storage segment calle...
Resource Allocation in Selfish and Cooperative Distributed Systems
In this dissertation we take an algorithmic view on resource allocation problems in distributed systems. We present a comprehensive perspective by studying a variety of distributed systems—from abstract models of generic distributed systems, through more specific and detailed models, to real distributed computer systems. These systems differ with respect to the nature of the resource allocation...
An Efficient Secret Sharing-based Storage System for Cloud-based Internet of Things
The Internet of Things (IoT) is a new information architecture based on the internet that develops interactions between objects and services in a secure and reliable environment. As the availability of smart devices rises, secure and scalable mass storage systems for aggregate data are required in IoT applications. In this paper, we propose a new method for storing aggregate data in Io...
Scalable Services for Video-on-demand
Video-on-demand (VOD) refers to video services in which users can request any video program from a server at any time. VOD has important applications in entertainment, education, information, and advertising, such as movie-on-demand, distance learning, home shopping, interactive news, etc. In order to provide VOD services accommodating a large number of video titles and concurrent users, a VOD...